Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required #15248

ExtReMLapin · 2025-08-11T16:25:21Z

Previously, tool_choice = required could not be combined with reasoning, because the grammar forced the output to start with a tool call.

Grammar now allows the model to think first.

Reasoning = big brain

Tool calling = strong arms

Now you can be very smart and very strong

Fixes #15247
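
As a rough illustration of "think first, then call a tool", here is a simplified sketch of how the root rule could be assembled. It is not the exact code in common/chat.cpp: the function name build_root_rule is made up for this example, and the rule names are assumed to be defined elsewhere in the grammar.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only (not part of llama.cpp).
// Returns the body of the root rule. The referenced rules
// (thinking-start, thinking-content, thinking-end) are assumed to exist.
static std::string build_root_rule(bool enable_thinking,
                                   bool thinking_forced_open,
                                   bool parallel_tool_calls,
                                   const std::string & tool_call) {
    assert(!tool_call.empty());

    std::string prelude; // empty when thinking is disabled
    if (enable_thinking) {
        // If the template already opened <think>, the grammar must not ask for it again.
        prelude = thinking_forced_open
            ? "(thinking-content thinking-end) "
            : "(thinking-start thinking-content thinking-end) ";
    }

    // The tool call stays mandatory; only a thinking block is allowed in front of it.
    return prelude + (parallel_tool_calls ? "(" + tool_call + ")+" : tool_call);
}
```

With thinking enabled and not forced open, build_root_rule(true, false, false, "tool-call") yields a root that requires a complete think block followed by one tool call.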

ExtReMLapin · 2025-08-11T16:31:44Z

All I am sure of is that it fixes the issues with Qwen3 + reasoning (enabled or disabled) + tool calling.
I fear it also has to be implemented for the other reasoning parsers (non-Hermes).

CC hermes2 contributor : @ochafik

ExtReMLapin · 2025-08-24T13:47:59Z

Back at the office on Tuesday. Re-reading the PR, I might reconsider the logic around thinking_forced_open.


ExtReMLapin · 2025-08-25T19:52:49Z

For a future PR, the following functions need the same kind of patch:

common_chat_params_init_granite : same as qwen3, but I'm not sure, as the thinking tags are not always there
common_chat_params_init_command_r7b : "<|START_THINKING|>", "<|END_THINKING|>"
common_chat_params_init_deepseek_r1 : same as qwen3
common_chat_params_init_gpt_oss : ???
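
Since these formats differ mainly in their closing tag, the "anything except the closing tag" rule could probably be generated instead of written by hand. A minimal sketch, assuming a hypothetical helper not_containing_tag_rule (not an existing llama.cpp function) and assuming the tag contains no characters that need escaping in GBNF:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only (not part of llama.cpp).
// Builds a GBNF rule body with the same shape as the thinking-content rule
// in this PR: for each prefix of end_tag, allow that prefix followed by any
// character other than the next character of the tag.
static std::string not_containing_tag_rule(const std::string & end_tag) {
    assert(!end_tag.empty());

    // Assumption: escaping is out of scope for this sketch, so refuse
    // characters that are special inside GBNF literals or char classes.
    if (end_tag.find_first_of("\"\\[]^-") != std::string::npos) {
        throw std::invalid_argument("end_tag needs escaping, not handled here");
    }

    std::string rule = "( ";
    for (size_t i = 0; i < end_tag.size(); i++) {
        if (i > 0) {
            // literal prefix consisting of the first i characters of the tag
            rule += " | \"" + end_tag.substr(0, i) + "\" ";
        }
        // ...followed by anything except the character that would extend the tag
        rule += "[^" + std::string(1, end_tag[i]) + "]";
    }
    rule += " )*";
    return rule;
}
```

For "</think>" this reproduces the thinking-content rule body from this PR, and the same call with "<|END_THINKING|>" would give the command_r7b variant. One caveat of this rule shape: a tag preceded by a repeat of its own first character (for example "<</think>") is consumed as content, so a stricter construction may be needed if that matters.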

ready for review @ggerganov

Not sure exactly who I should ping as ochafik seems to be busy this week


ExtReMLapin · 2025-09-23T11:08:34Z

Not sure who I should ping

@slaren

ggerganov · 2025-09-24T07:20:55Z

Same comment as #15019 (comment)

This is a smaller change, so I can take a look and merge it, but I would prefer that we have someone who takes over this part of the code.

ExtReMLapin · 2025-09-24T07:38:57Z

Got it, thanks for keeping me updated !

ExtReMLapin · 2025-10-14T12:25:53Z

If you are being held hostage, blink twice @ochafik


ochafik

Thanks @ExtReMLapin ! Looks very promising :-)

Could you add some tests in test-chat.cpp? (You will need to force git add models/templates/qwen3-something.jinja; see models/templates/README.md.)

ochafik · 2025-10-30T01:22:28Z

common/chat.cpp

+            builder.add_rule("thinking-start", "\"<think>\"");
+            builder.add_rule("thinking-content", "( [^<] | \"<\" [^/] | \"</\" [^t] | \"</t\" [^h] | \"</th\" [^i] | \"</thi\" [^n] | \"</thin\" [^k] | \"</think\" [^>] )*");
+            builder.add_rule("thinking-end", "\"</think>\" space");
+
+            //thinking grammar logic depending on if thinking_forced_open was to true (so already opened (and maybe closed)) and if thinking is even allowed
+            std::string thinking_grammar_logic = ""; // thinking tag was closed or not supported/wanted
+            if (extra_context["enable_thinking"]) {
+                data.grammar_triggers.push_back({
+                    COMMON_GRAMMAR_TRIGGER_TYPE_WORD,
+                    data.thinking_forced_open ? "</think>" : "<think>"
+                });
+                if (data.thinking_forced_open) {
+                    //thinking tag was already opened by used so we don't need to add it again
+                    thinking_grammar_logic = "(thinking-content thinking-end) ";
+                }
+                else
+                {
+                    thinking_grammar_logic = "(thinking-start thinking-content thinking-end) ";
+                }
+            }
+
+
+            builder.add_rule("root", thinking_grammar_logic + (inputs.parallel_tool_calls ? "(" + tool_call + ")+" : tool_call));


Here's (untested) code that addresses some issues:

If tool_choice isn't required and the output contains a </think> (when thinking is forced open) or a <think>, we trigger the lazy grammar and then require a tool call after it.

Rules (apart from root) can be renamed when their names collide, so the return value of add_rule has to be used when referencing them.
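
To make the second point concrete, here is a toy builder that renames a rule when its name is already taken by a different body. It is a sketch only: ToyBuilder and its numeric-suffix scheme are assumptions for this example, not the actual llama.cpp builder.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a grammar builder (NOT the llama.cpp implementation).
struct ToyBuilder {
    std::map<std::string, std::string> rules;

    // Registers a rule and returns the name it was actually stored under.
    // Assumed behavior: on a collision with a different body, a numeric
    // suffix is appended until a free (or identical) slot is found.
    std::string add_rule(const std::string & name, const std::string & body) {
        assert(!name.empty());
        auto it = rules.find(name);
        if (it == rules.end() || it->second == body) {
            rules[name] = body;
            return name;
        }
        for (int i = 0; ; i++) {
            const std::string candidate = name + std::to_string(i);
            auto jt = rules.find(candidate);
            if (jt == rules.end() || jt->second == body) {
                rules[candidate] = body;
                return candidate;
            }
        }
    }
};
```

A caller that hard-codes "think-start" in the root rule would silently reference the older rule after a collision, which is why the returned name should be the one concatenated into the root rule.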

Suggested change

-            builder.add_rule("thinking-start", "\"<think>\"");
-            builder.add_rule("thinking-content", "( [^<] | \"<\" [^/] | \"</\" [^t] | \"</t\" [^h] | \"</th\" [^i] | \"</thi\" [^n] | \"</thin\" [^k] | \"</think\" [^>] )*");
-            builder.add_rule("thinking-end", "\"</think>\" space");
-            //thinking grammar logic depending on if thinking_forced_open was to true (so already opened (and maybe closed)) and if thinking is even allowed
-            std::string thinking_grammar_logic = ""; // thinking tag was closed or not supported/wanted
-            if (extra_context["enable_thinking"]) {
-                data.grammar_triggers.push_back({
-                    COMMON_GRAMMAR_TRIGGER_TYPE_WORD,
-                    data.thinking_forced_open ? "</think>" : "<think>"
-                });
-                if (data.thinking_forced_open) {
-                    //thinking tag was already opened by used so we don't need to add it again
-                    thinking_grammar_logic = "(thinking-content thinking-end) ";
-                }
-                else
-                {
-                    thinking_grammar_logic = "(thinking-start thinking-content thinking-end) ";
-                }
-            }
-            builder.add_rule("root", thinking_grammar_logic + (inputs.parallel_tool_calls ? "(" + tool_call + ")+" : tool_call));
+            // thinking grammar logic depending on if thinking_forced_open was to true (so already opened (and maybe closed)) and if thinking is even allowed
+            if (extra_context["enable_thinking"]) {
+                data.grammar_triggers.push_back({
+                    COMMON_GRAMMAR_TRIGGER_TYPE_WORD,
+                    data.thinking_forced_open ? "</think>" : "<think>"
+                });
+                std::string prelude = "";
+                if (!data.thinking_forced_open) {
+                    prelude = builder.add_rule("think-start", "\"<think>\"");
+                }
+                prelude += " ";
+                prelude += builder.add_rule("think-content", "( [^<] | \"<\" [^/] | \"</\" [^t] | \"</t\" [^h] | \"</th\" [^i] | \"</thi\" [^n] | \"</thin\" [^k] | \"</think\" [^>] )*");
+                prelude += " ";
+                prelude += builder.add_rule("think-end", "\"</think>\" space");
+                prelude += " ";
+                builder.add_rule("root", prelude + "(" + tool_call + ")" + (inputs.parallel_tool_calls ? "*" : "?"));
+            } else {
+                builder.add_rule("root", inputs.parallel_tool_calls ? "(" + tool_call + ")+" : tool_call);
+            }

Update chat.cpp to support (at least) qwen3 + tool_choice = required

c6c4f7c

ExtReMLapin marked this pull request as ready for review August 11, 2025 16:31

ExtReMLapin mentioned this pull request Aug 11, 2025

Misc. bug: [chat] (hermes 2) Impossible de to use both tool_choice: "required" and reasoning #15247

Closed

ExtReMLapin added 2 commits August 11, 2025 23:21

refactored changes to follow string tern op

42937a5

fixing editorconfig-checker CI (tailing whitespace)

5796938

ExtReMLapin mentioned this pull request Aug 12, 2025

[Feature]: support tool and reasoning together vllm-project/vllm#14429

Closed

1 task

broadbit-hu mentioned this pull request Aug 21, 2025

Eval bug: Nondeterministic output with ROCm backend despite zero temperature #14727

Open

ExtReMLapin and others added 5 commits August 25, 2025 18:19

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_required

de07a43

hermes 2 pro tool calling, better support for thinking (thinking tag already opened grammar)

79e4a7b

qwen hermes tool calling : fixed grammar rules names

dbae921

fixed really weird grammar crash `Unexpected empty grammar stack after accepting piece:`

86493dd

also apply the hotcrashfix here, just in case

bb5e352


Pierre F and others added 3 commits August 26, 2025 12:30

reverted changes done to grammar_lazy for hermes 2

6d5f561

if there is enable_thinking enabled but hermes model doesn't support it, just disable it, don't GGML_ABORT

352274e

fix thinking-content eating closing think tag | ref ggml-org#8953

0e55830

ExtReMLapin marked this pull request as draft August 26, 2025 12:59

removed ? from grammar as it doesn't crash on linux, probably worth it's own issue

e62cd70

ExtReMLapin mentioned this pull request Aug 27, 2025

Misc. bug: Tool calling CRASH : Unexpected empty grammar stack after accepting piece<tool_call> #15608

Closed

ExtReMLapin and others added 2 commits August 28, 2025 20:37

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_required

2f28a1c

fixed crash with "auto" mode, trigger was missing

310701b

ExtReMLapin marked this pull request as ready for review August 29, 2025 12:49

This was referenced Aug 29, 2025

Model: Seed OSS thinking + tool call support #15552

Merged

feat: nemotron thinking & toolcalling support #15676

Merged

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_required

5688afa

ggerganov requested a review from ochafik September 24, 2025 07:19

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_required

dc75a57

ExtReMLapin mentioned this pull request Oct 20, 2025

[Bug]: Hybrid Attention models broken after switching to flashinfer 0.4 (tested on Granite 4.0 H, Qwen3-Next, Jamba-3B, Nemotron-H-8b) vllm-project/vllm#26936

Open

1 task

ochafik suggested changes Oct 30, 2025
